Analyzing risk behaviors of youth
Introduction
Risky behaviors are acts that increase the risk of disease or injury and can eventually threaten health or even life. Risky behaviors that young people engage in are especially consequential, because they affect their well-being and life prospects (Gruber, 2001). Activities such as smoking, drinking alcohol, having sex, and taking drugs can have consequences for the rest of their lives. Therefore, our group would like to gain a better understanding of youth risk behavior patterns and draw insights that can help teenagers form lifelong healthy behaviors. Our project builds a surveillance system that analyzes three categories of health risk behavior among youth: tobacco use, alcohol and other drug use, and sexual behaviors.
We obtained the data set, which contains 2,740,200 rows and 35 columns, from Kaggle. It comes from the Youth Risk Behavior Surveillance System (YRBSS), which surveys high school students in over 100 schools in the United States about adverse health behaviors. The survey data span 1991 to 2017 and associate a risk percentage with specific health-related issues across demographic categories such as race, grade, sex, and location.
Based on the data set, we formulated the following questions to explore:
Is there an increase in tobacco, alcohol, and other drug use among youth over the years in different regions and states of the United States of America?
(We aim to determine whether drug use behaviors, including consuming alcohol, marijuana, heroin, or cocaine, increased over the years in different states across the US. We look at how the trend changes and how it differs by type of drug, year, and state.)
Which student demographics are susceptible to different substance abuse and sexual behaviors?
(We want to check whether there is any relationship between demographics and risky behaviors. We would like to see how sex, grade, race, and location may influence young people's tendency to engage in drug use and sexual behaviors.)
Does excessive alcohol/drug abuse lead to higher chances of sexually transmitted diseases among high school students?
(We plan to determine whether alcohol or drug use is associated with a higher likelihood of engaging in sexual behaviors, which may in turn increase sexually transmitted diseases among youth. We will examine whether these risky behaviors are related and, if so, how strong the correlation is.)
By analyzing the data set, we expect to find useful relationships between behaviors and experiences among high school students and to identify how risky behaviors change over time and place. Based on these findings, we can offer insights and suggestions on how to strengthen current law enforcement and improve legal regulations to prevent high school students from engaging in risky behaviors. Furthermore, our system could help guide non-profit organizations in helping and protecting at-risk teenagers and young adults in the United States.
Choice for Heavier Grading on Data Processing
Our group has decided that our project should be graded more heavily on data processing. We believe our work goes beyond the basic data processing needed for most data sets because our data set has three topics in separate sheets, and we made a great effort to clean and process each sheet before merging them into one. We spent time understanding this large data set before dropping any irrelevant information. We then performed tasks such as recoding, reindexing, and changing data types to further process the data in each sheet.
Data Cleaning
Data Processing
Since our data set falls under the main theme of youth risk behaviors, the format and columns are quite similar across sheets, even though each sheet has its own subtopics such as tobacco use, sexual behavior, and alcohol and drug use. Therefore, we used similar methods to clean and process the data in each sheet.
Cleaning
Transforming
Enhancing
Data Acquisition and Cleaning Code
So far, we have completed data acquisition and data cleaning. We downloaded the data set from Kaggle and obtained an overview of the data. We then worked on understanding the meaning of each column and cleaned the data by removing irrelevant columns. Next, we obtained detailed information for each column, such as summary statistics and unique values. We filtered the data and dropped null values. Finally, we reordered the columns into a more logical order to prepare the data for processing.
Alcohol and drug use among youth in different states of the United States
We now check whether youth risk behavior related to alcohol and drug abuse has increased across US states. We expected an increase in these risky behaviors throughout the states, as we believed access to drugs and alcohol had become easier over time. We assumed that Nevada and New York would have the most youth risk behavior. Let us find out whether the data support these assumptions. We segregate the data according to the subtopics of alcohol use and other drug use. Within the other drug use subtopic, we select the greater-risk responses related to marijuana, heroin, and cocaine, because they are among the most commonly used drugs in the United States.
Before creating a visualization to analyze the data, we first process the data so that it is suitable for visualization.
With the data processed, we proceed to the visualization. We use a density map (px.density_mapbox) from the Plotly Express library in Python. We chose this visualization because it shows which states are most at risk in each year. It also lets us compare risk percentages between states across years and see the trend of risk behavior in all states over time. The density map works like a heatmap, showing the intensity of risk in different areas of the US.
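The cleaning steps described above can be sketched in pandas as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual code: the helper name clean_sheet, the KEEP_COLUMNS list, the extra "Footnote" column, and the dummy rows (including their Subtopic labels) are hypothetical; only the column names used elsewhere in this notebook (YEAR, LocationAbbr, Sex, Race, Grade, Subtopic, Greater_Risk_Data_Value) come from the data set.

```python
import numpy as np
import pandas as pd

# Hypothetical column order: identifiers first, then demographics, then the risk value
KEEP_COLUMNS = ["YEAR", "LocationAbbr", "Subtopic", "Sex", "Race", "Grade",
                "Greater_Risk_Data_Value"]


def clean_sheet(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop irrelevant columns, drop rows with nulls, fix types, and reorder columns."""
    df = raw[KEEP_COLUMNS].dropna()              # keep relevant columns in a fixed order, drop nulls
    df = df.astype({"YEAR": int})                # make sure YEAR is an integer
    df = df.sort_values(["YEAR", "LocationAbbr"])
    return df.reset_index(drop=True)


# Tiny made-up sheet (not real YRBSS values)
raw = pd.DataFrame({
    "YEAR": [2015, 2015, 2017],
    "LocationAbbr": ["NY", "CA", "NY"],
    "Subtopic": ["Alcohol Use"] * 3,
    "Sex": ["Female", "Male", "Male"],
    "Race": ["White", "Hispanic", "White"],
    "Grade": ["9th", "12th", "10th"],
    "Greater_Risk_Data_Value": [20.5, np.nan, 31.2],
    "Footnote": ["", "", ""],                    # an example of an irrelevant column
})

clean = clean_sheet(raw)
print(clean.describe())           # summary statistics of the numeric columns
print(clean["Grade"].unique())    # unique values of one column
```

The same function can be applied to each sheet before merging, which is what keeps the sheets' formats aligned.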
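The processing code itself is not shown here. One way to produce per-location, per-year frames such as avg_Alcohol, which the maps below expect, is a groupby average. This sketch assumes a cleaned frame with Subtopic, YEAR, LocationAbbr, Latitude, Longitude, and Greater_Risk_Data_Value columns; the helper yearly_location_average and the sample rows (including the "Alcohol" and "Marijuana" labels) are hypothetical.

```python
import pandas as pd


def yearly_location_average(df: pd.DataFrame, subtopic: str) -> pd.DataFrame:
    """Average the greater-risk value per year and location for one subtopic."""
    subset = df[df["Subtopic"] == subtopic]
    return (
        subset.groupby(["YEAR", "LocationAbbr"], as_index=False)
        .agg(
            Greater_Risk_Data_Value=("Greater_Risk_Data_Value", "mean"),
            Latitude=("Latitude", "first"),      # one coordinate pair per location
            Longitude=("Longitude", "first"),
        )
        .sort_values(["YEAR", "LocationAbbr"])   # animation frames appear in year order
        .reset_index(drop=True)
    )


# Made-up rows for illustration only
data = pd.DataFrame({
    "Subtopic": ["Alcohol", "Alcohol", "Alcohol", "Marijuana"],
    "YEAR": [2011, 2011, 2013, 2011],
    "LocationAbbr": ["LA", "LA", "LA", "LA"],
    "Latitude": [31.3, 31.3, 31.3, 31.3],
    "Longitude": [-92.4, -92.4, -92.4, -92.4],
    "Greater_Risk_Data_Value": [40.0, 50.0, 30.0, 20.0],
})

avg_Alcohol = yearly_location_average(data, "Alcohol")
print(avg_Alcohol)
```

Averaging before plotting gives each location a single point per animation frame, so the map shows one value per state and year.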
# Using a density map to create a heatmap of alcohol usage among youth in the US,
# showing the intensity of the risk in different areas
import plotly.express as px

fig = px.density_mapbox(avg_Alcohol, lat='Latitude', lon='Longitude', z='Greater_Risk_Data_Value',
                        radius=30, zoom=3,
                        color_continuous_scale=px.colors.sequential.Viridis,
                        mapbox_style="stamen-terrain", animation_frame='YEAR',
                        hover_name='LocationAbbr')
fig.update_layout(
    title='Alcohol Usage among youth in the different states of the United States',
)
fig.show()
Observations
Alcohol consumption is widespread among youth in the United States. In the early 1990s, youth alcohol consumption was concentrated in a few states, but over time it has become prevalent in many states, if not increasing. The risk percentage decreases while the map becomes denser over time. This could mean that more teenagers drink but fewer of them are at high risk, although a denser map may also reflect more locations reporting data in later years. One possible explanation is that alcohol has become easier to access, leading more youth to drink. At the same time, young people are just as aware of the consequences of alcohol abuse.
Comparing all states with one another, the East Coast states such as New York, New Jersey, Massachusetts, Maryland, and Rhode Island, along with Denver, are the most prone to alcohol-related risk behavior. Across all states, the risk value increased steadily after 2001; its intensity spiked in 2005 and remained high until it declined after 2011.
The West Coast states such as California and Nevada have had a consistent intensity of alcohol risk throughout the years. One exception is the region near San Francisco and Los Angeles: the alcohol risk percentage there was relatively low, but the density of the risk increased considerably after 2013.
The Midwest as a whole has seen an increase in alcohol-related risk behavior among youth. In the early 1990s, the density of alcohol risk behavior was lower, but as the years passed, more and more Midwestern teenagers engaged in drinking. The highest percentage of teens at risk in this region was recorded in 2007, when 56.229% of teenagers in Illinois were at risk. As on the East Coast, the density of teenagers drinking alcohol began increasing after 2001.
In the southern states, alcohol risk behavior has been prevalent throughout the years. The highest density of the risk was observed in 2011, especially in Louisiana, where 48.585% of teenagers were at risk. Florida shows more alcohol-related risk than the rest of the southern states.
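Because a density map becomes denser whenever more points are drawn, part of the increase in density noted in the observations above may simply reflect more locations reporting data in later survey years. A quick check is to count reporting locations per year next to the mean risk value. This is a sketch on made-up numbers, using the same per-location layout assumed for avg_Alcohol.

```python
import pandas as pd

# Made-up per-location averages in the same layout as avg_Alcohol
avg = pd.DataFrame({
    "YEAR": [1991, 1991, 2011, 2011, 2011, 2011],
    "LocationAbbr": ["NY", "CA", "NY", "CA", "LA", "IL"],
    "Greater_Risk_Data_Value": [50.0, 46.0, 38.0, 35.0, 48.0, 40.0],
})

summary = avg.groupby("YEAR").agg(
    n_locations=("LocationAbbr", "nunique"),        # how many points each map frame draws
    mean_risk=("Greater_Risk_Data_Value", "mean"),   # average risk across those points
)
print(summary)
```

If n_locations rises over the years while mean_risk falls, a denser map does not by itself show that more teenagers are drinking.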
#Using density plot to create a heatmap for Marijuana usage among youth in US for
#understanding the intensity of the risk in those areas
fig = px.density_mapbox(avg_Marijuana, lat='Latitude', lon='Longitude', radius=30, zoom=3,z='Greater_Risk_Data_Value',
color_continuous_scale= px.colors.sequential.Viridis,
mapbox_style="stamen-terrain", animation_frame='YEAR', hover_name='LocationAbbr')
fig.update_layout(
title = 'Marijuana Usage among youth in the different states of United States',
)
fig.show()
Observations
Again, the East Coast states have a higher risk of teenagers using marijuana. The West Coast states such as California and Nevada also have a high percentage of youth at risk from marijuana use. As the range of risk percentages shifts from 5-30% to 10-50%, the overall trend shows that both the density of teenagers using marijuana and the percentage of youth at risk from it have increased.
Marijuana use among youth is present in nearly all states, but states in the Mountain-Prairie and Southwest regions, such as South Dakota, North Dakota, Nebraska, and Arizona, are at the lowest risk. Apart from Florida, the southern states are also at lower risk from marijuana-related youth behavior.
In the Midwest, Illinois has the highest marijuana-related risk behavior among youth, followed by Wisconsin and Michigan. The marijuana risk in Wisconsin seems to increase after 2001; the highest risk percentage recorded for the state was about 37.27% in 2011.
2011 is also the year in which the risk values are densest across the entire United States.
#Using density plot to create a heatmap for Heroin usage among youth in US for
#understanding the intensity of the risk in those areas
fig = px.density_mapbox(avg_Heroin, lat='Latitude', lon='Longitude', radius=30, zoom=3,z='Greater_Risk_Data_Value',
color_continuous_scale=px.colors.sequential.Viridis,
mapbox_style="stamen-terrain", animation_frame='YEAR', hover_name='LocationAbbr')
fig.update_layout(
title = 'Heroin Usage among youth in the different states of United States',)
fig.show()
Observations
Even though the percentage of youth at risk from heroin is not high, it has increased over the years. The increase is visible in 2009, when the greater-risk percentage reached up to 10%. It later appears to decrease, but it has not returned to its 1999 level.
The states most prone to heroin use among youth are New York, New Jersey, Maryland, Connecticut, New Hampshire, Pennsylvania, Delaware, Louisiana, Florida, and California.
The density of heroin-related youth risk was highest in 2011, although the percentage of teenagers at risk had decreased.
Hawaii has also had a steady rise in risk since 2013.
In recent years, heroin use has risen in certain regions of the United States, which is alarming, but after 2015 the risk percentage is declining again.
#Using density plot to create a heatmap for Cocaine usage among youth in US for
#understanding the intensity of the risk in those areas
fig = px.density_mapbox(avg_Cocaine, lat='Latitude', lon='Longitude', radius=30, zoom=3,z='Greater_Risk_Data_Value',
color_continuous_scale=px.colors.sequential.Viridis,
mapbox_style="stamen-terrain", animation_frame='YEAR', hover_name='LocationAbbr')
fig.update_layout(
title = 'Cocaine Usage among youth in the different states of United States')
fig.show()
Observations
The map above is plotted from the data points available in our data set. Over the years, cocaine use among youth fluctuates up and down. This could be due to inconsistent data on cocaine use among youth.
Inferences
Based on the observations, we conclude that the East Coast states are the most susceptible to risk behaviors related to alcohol and drug abuse. This could be due to the following reasons:
Florida is another region with a significant risk of youth engaging in alcohol and drug abuse. Possible reasons include:
We expected to find high marijuana use among youth in California and Nevada, as medical marijuana was legalized in these states in 1996 and 2000, respectively.
The risk of alcohol use among youth on the East Coast and in the Midwest increased from 2001 to 2005. One possible cause of this surge is the September 11 attacks. The attacks shocked the whole nation, but youth in these regions may have been the most affected; some of them may have lost a parent in the attacks, causing them great distress. As reported in this article, there was an increase in mental health conditions such as PTSD, leading to more alcohol use and substance abuse.
Organizations that could benefit from this data include:
Which student demographics are susceptible to different substance abuse and sexual behaviors?
Exploring the Dataset
Data Processing for Decision Tree
Decision Tree Implementation
We use a decision tree to identify which student demographics are most susceptible to substance abuse, irrespective of the year. We use a decision tree regressor rather than a classifier because the target, Greater_Risk_Data_Value, is a continuous percentage: we want the predicted risk value along each path of decisions, not a class label.
We also do not split the data set into training and test sets, because our goal is descriptive: we want to understand which path in the tree leads to the students most susceptible to substance abuse, not to make predictions.
The tree has several hyperparameters, such as max_depth, min_samples_split, and min_samples_leaf. We set max_depth to 7. Increasing it lets the tree grow deeper, but the pattern we obtained with a max_depth of 7 was the same as with 20, so we chose the smaller value.
We set min_samples_split to 2 and min_samples_leaf to 1, which are scikit-learn's defaults. Earlier, we grouped the data by ["LocationAbbr", "Sex", "Race", "Grade"] and aggregated the values within each group, so each sample in the grouped data frame already summarizes several (for example, five or six) original observations. For this reason, and after trying multiple values, we chose 2 and 1 for min_samples_split and min_samples_leaf.
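The grouping, encoding, and fitting steps described above can be sketched end to end as follows. The data frame and its values are made up, and using the mean as the aggregation is an assumption; the grouping keys, the target column, and the hyperparameters match the ones used in this notebook.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

KEYS = ["LocationAbbr", "Sex", "Race", "Grade"]

# Made-up survey rows: several raw observations per demographic group
raw = pd.DataFrame({
    "LocationAbbr": ["AK", "AK", "AK", "CA", "CA", "CA"],
    "Sex": ["Male", "Male", "Female", "Male", "Female", "Female"],
    "Race": ["White"] * 6,
    "Grade": ["12th", "12th", "9th", "12th", "9th", "9th"],
    "Greater_Risk_Data_Value": [34.0, 33.0, 12.0, 30.0, 10.0, 14.0],
})

# 1. Aggregate: one row per demographic group, so each sample is already an average
grouped = raw.groupby(KEYS, as_index=False)["Greater_Risk_Data_Value"].mean()

# 2. One-hot encode the categorical demographics
x = pd.get_dummies(grouped[KEYS])
y = grouped["Greater_Risk_Data_Value"]

# 3. Fit the regression tree with the same settings as in the notebook
regr = DecisionTreeRegressor(max_depth=7, min_samples_split=2,
                             min_samples_leaf=1, random_state=0)
regr.fit(x, y)
print(len(grouped), "groups;", regr.get_n_leaves(), "leaves")
```

With min_samples_leaf=1 and enough depth, the tree can isolate every distinct group, which is acceptable here because the goal is to describe the groups rather than to generalize.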
from sklearn.tree import DecisionTreeRegressor

# Change the depth to see which one might be better
# (we tried 2, 5, 7, 10)
regr_1 = DecisionTreeRegressor(max_depth=7,
                               min_samples_split=2,
                               min_samples_leaf=1)
regr_1.fit(x, y)
#!pip install graphviz
DecisionTreeRegressor(max_depth=7)
# Visualizing the fitted regressor
# (requires the graphviz Python package and the Graphviz binaries)
from graphviz import Source
from IPython.display import Image
from sklearn import tree

graph = Source(tree.export_graphviz(regr_1,
                                    out_file=None,
                                    feature_names=list(x.columns),
                                    filled=True))
png_bytes = graph.pipe(format='png')
with open('alcohol_dtree_7.png', 'wb') as f:
    f.write(png_bytes)
Image(png_bytes)
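# In a regression tree, a node's value is the mean target of its samples, so:
# value of parent = (samples N1 * value N1 + samples N2 * value N2) / (samples N1 + samples N2)
# samples of parent = samples N1 + samples N2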
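In a regression tree, each node's value is the mean target of the samples it contains, so a parent's value is the sample-weighted average of its two children, not their simple average (the two coincide only when both children hold the same number of samples). The sketch below checks this on a small made-up data set using scikit-learn's fitted tree_ attributes (children_left, children_right, n_node_samples, value).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Small made-up data set with one feature
X = np.array([[0], [1], [2], [3], [4], [5], [6]])
y = np.array([5.0, 6.0, 7.0, 20.0, 21.0, 40.0, 41.0])

regr = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
t = regr.tree_

for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:                 # leaf node: no children to compare
        continue
    n_left, n_right = t.n_node_samples[left], t.n_node_samples[right]
    v_left, v_right = t.value[left][0][0], t.value[right][0][0]
    weighted = (n_left * v_left + n_right * v_right) / (n_left + n_right)
    # The parent's value is the sample-weighted mean of its children
    assert np.isclose(t.value[node][0][0], weighted)
    # and its sample count is the sum of its children's counts
    assert t.n_node_samples[node] == n_left + n_right
print("parent = weighted mean of children for every split")
```

This is why the values shown in the rendered tree cannot be combined by simple averaging when the children have different sample counts.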
Insights from the Alcohol Dataset & Steps to be Taken to Control the Alcohol Consumption Rise
The decision tree above shows that in Alaska ("AK"), male students in 12th grade are more susceptible to higher alcohol consumption and drug abuse, with a risk value of 33.445%. Similarly, in California ("CA"), male students in 12th grade are more susceptible to higher alcohol consumption and drug abuse, with a risk value of 30.166%.
The governments of these states could adopt the following strategies to control the rise in alcohol consumption:
# Build the decision tree for the tobacco dataset
# One-hot encode the demographic columns (equivalent to concatenating the
# dummies and then dropping the original columns)
x = pd.get_dummies(tobacco_df_2[['Sex', 'Grade', 'Race', 'LocationAbbr']])
x.head()
y = tobacco_df_2['Greater_Risk_Data_Value']
y.head()
regr_1 = DecisionTreeRegressor(max_depth=7,
                               min_samples_split=2,
                               min_samples_leaf=1)
regr_1.fit(x, y)
graph = Source(tree.export_graphviz(regr_1,
                                    out_file=None,
                                    feature_names=list(x.columns),
                                    filled=True))
png_bytes = graph.pipe(format='png')
with open('tobacco_dtree_7.png', 'wb') as f:
    f.write(png_bytes)
Image(png_bytes)
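Reading the highest-risk leaf off a rendered depth-7 tree image is error-prone. The sketch below finds it programmatically from scikit-learn's tree_ arrays and lists the conditions leading to it. The helper highest_risk_path and the toy data are hypothetical; for one-hot features, a condition such as "Sex_Male > 0.50" means "is Male".

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor


def highest_risk_path(regr, feature_names):
    """Return the leaf with the largest predicted value and the conditions leading to it."""
    t = regr.tree_
    # Map each child node to (parent node, whether it is the left child)
    parent = {}
    for node in range(t.node_count):
        for child, went_left in ((t.children_left[node], True),
                                 (t.children_right[node], False)):
            if child != -1:
                parent[child] = (node, went_left)
    leaves = [n for n in range(t.node_count) if t.children_left[n] == -1]
    best = max(leaves, key=lambda n: t.value[n][0][0])
    conditions = []
    node = best
    while node in parent:                        # walk up until the root
        p, went_left = parent[node]
        name, thr = feature_names[t.feature[p]], t.threshold[p]
        conditions.append(f"{name} {'<=' if went_left else '>'} {thr:.2f}")
        node = p
    return best, t.value[best][0][0], conditions[::-1]   # root-first order


# Made-up one-hot encoded demographics
x = pd.DataFrame({"Sex_Male": [1, 1, 0, 0], "Grade_12th": [1, 0, 1, 0]})
y = pd.Series([33.0, 20.0, 25.0, 10.0])
regr = DecisionTreeRegressor(max_depth=7, random_state=0).fit(x, y)

leaf, value, path = highest_risk_path(regr, list(x.columns))
print(value, path)
```

In the notebook, the same helper could be called with regr_1 and list(x.columns) after each fit.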
Insights from the Tobacco Dataset & Steps to be Taken to Control the Tobacco Consumption Rise
The decision tree above shows that in Louisiana ("LA"), female students who are White and in 12th grade are more susceptible to higher tobacco consumption, with a risk value of 51.991%. The same trend is observed among male students, with a risk value of 44.683%.
According to the sources linked here, Louisiana spends less than 3% of its tobacco revenue on anti-smoking programs. Investing more in these programs might increase awareness among youth and reduce tobacco consumption.
We also observe that in Washington ("WA"), male students who are "White" or "Native Hawaiian or Other Pacific Islander" and are not in 12th grade (that is, in 9th, 10th, or 11th grade) are more susceptible to higher tobacco consumption, with a risk value of 53.168%.
According to the sources linked here, the Washington state government could take the following steps to prevent youth from using tobacco and avoid a lifetime of addiction:
# Build the decision tree for the sexual behaviours dataset
# One-hot encode the demographic columns (equivalent to concatenating the
# dummies and then dropping the original columns)
x = pd.get_dummies(sexb_df_2[['Sex', 'Grade', 'Race', 'LocationAbbr']])
x.head()
y = sexb_df_2['Greater_Risk_Data_Value']
y.head()
regr_1 = DecisionTreeRegressor(max_depth=7,
                               min_samples_split=2,
                               min_samples_leaf=1)
regr_1.fit(x, y)
graph = Source(tree.export_graphviz(regr_1,
                                    out_file=None,
                                    feature_names=list(x.columns),
                                    filled=True))
png_bytes = graph.pipe(format='png')
with open('sexb_dtree_7.png', 'wb') as f:
    f.write(png_bytes)
Image(png_bytes)
Insights from the Sexual Behaviours Dataset & Steps to be Taken to Control the Unhealthy Sexual Behaviour in Youths
Overall, the decision tree above shows that in Utah ("UT"), students in any grade from 9th to 12th, both male and female, are more susceptible to unhealthy sexual behaviour, with a risk value of at least 88.085%.
According to the sources linked here, Utah is relatively conservative, and this may be one reason why young people there engage in unhealthy sexual behaviours. According to the CDC, there is insufficient evidence that programs which promote abstinence and emphasize condom failure rates prevent STDs and pregnancy. At the same time, Utah public schools restrict what health educators can say about preventing these diseases, state law prohibits them from encouraging condom use, and the law is reluctant even to mention contraception; this reflects the state's conservatism.
How does the risky behavior among youth in different grades change over the years? Is there any significant pattern in risky behavior over the years for high school girls and boys?
To answer these questions, we first need to define what counts as high-risk behavior. The 'Greater_Risk_Data_Value' variable gives the percentage of students who answered positively to the greater-risk question. If this value is 40% or higher, we consider the observation to show high-risk behavior; we chose 40% because we need a value above the upper quartile of the observations. To make our findings more concise, we group the data by (1) year and grade and (2) year, sex, and grade, and count the observations in each group. We then keep only the observations with a 'Greater_Risk_Data_Value' of at least 40% and count them per group. From these two counts, we calculate the percentage of high-risk observations in each group.
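The count-and-merge procedure described above can also be expressed as a single group-wise mean of the 0/1 high-risk flag, because the mean of an indicator equals the share of observations where it is 1. The sketch below uses made-up rows and the same 40% cut-off and >= comparison as the notebook code. One difference: a group with no high-risk observations gets 0 here instead of the NaN that a left merge produces.

```python
import numpy as np
import pandas as pd

# Made-up observations (not real YRBSS values)
all_data = pd.DataFrame({
    "YEAR": [1991, 1991, 1991, 1991, 1993, 1993],
    "Sex": ["Female", "Female", "Male", "Male", "Female", "Male"],
    "Grade": ["9th"] * 6,
    "Greater_Risk_Data_Value": [45.0, 12.0, 40.0, 55.0, 8.0, 39.9],
})

# 1 if the observation counts as high risk (40% or more), else 0
all_data["High_risk_behavior"] = np.where(all_data["Greater_Risk_Data_Value"] >= 40, 1, 0)

# Share of high-risk observations per group, as a percentage
pct_gender = (all_data.groupby(["YEAR", "Sex", "Grade"])["High_risk_behavior"]
              .mean().mul(100).round(2))
pct_total = (all_data.groupby(["YEAR", "Grade"])["High_risk_behavior"]
             .mean().mul(100).round(2))
print(pct_gender)
print(pct_total)
```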
import numpy as np

# Creating a dummy variable for observations where Greater_Risk_Data_Value is 40 or higher
all_data['High_risk_behavior'] = np.where(all_data['Greater_Risk_Data_Value'] >= 40, 1, 0)
# Finding count of risk behavior observations by grouping data based on year, grade and gender
grouped_year_gender_grade = all_data.groupby(['YEAR', 'Sex','Grade']).count()
# Finding count of risk behavior data based on year and grade for both genders
grouped_year_total_grade = all_data.groupby(['YEAR', 'Grade']).count()
# Finding the number of observations with a risk value of at least 40 percent, grouped by year, gender and grade
grouped_high_risk_gender = all_data[all_data['High_risk_behavior'] == 1].groupby(['YEAR', 'Sex', 'Grade']).count()
# Finding the number of observations with a risk value of at least 40 percent, grouped by year and grade for both genders
grouped_high_risk_total = all_data[all_data['High_risk_behavior'] == 1].groupby(['YEAR', 'Grade']).count()
# Merging the counts we previously found for risk behaviors into a new dataframe
new_df_high_risk_gender = pd.merge(grouped_year_gender_grade['Subtopic'], grouped_high_risk_gender['Subtopic'], on=['YEAR', 'Sex', 'Grade'], how="left")
new_df_high_risk_total = pd.merge(grouped_year_total_grade['Subtopic'], grouped_high_risk_total['Subtopic'], on=['YEAR', 'Grade'], how="left")
# Calculating the percentage of observations with a risk value of at least 40 percent, by gender and for both genders combined
new_df_high_risk_gender['High Risk Percentage'] = round((new_df_high_risk_gender['Subtopic_y']/new_df_high_risk_gender['Subtopic_x'] * 100),2)
new_df_high_risk_total['High Risk Percentage'] = round((new_df_high_risk_total['Subtopic_y']/new_df_high_risk_total['Subtopic_x'] * 100),2)
# Resetting the index and renaming the grades
new_df_high_risk_gender.reset_index(inplace = True)
new_df_high_risk_total.reset_index(inplace = True)
new_df_high_risk_gender = new_df_high_risk_gender.replace({'Grade' : { '9th' : 'Ninth Grade', '10th' : 'Tenth Grade', '11th' : 'Eleventh Grade', '12th':'Twelfth Grade' }})
new_df_high_risk_total = new_df_high_risk_total.replace({'Grade' : { '9th' : 'Ninth Grade', '10th' : 'Tenth Grade', '11th' : 'Eleventh Grade', '12th':'Twelfth Grade' }})
# Setting the original indexes for both the dataframes after renaming the grade columns
new_df_high_risk_gender = new_df_high_risk_gender.set_index(keys=['YEAR','Sex', 'Grade']).sort_index(level = [0,1,2])
new_df_high_risk_total = new_df_high_risk_total.set_index(keys=['YEAR','Grade']).sort_index(level = [0,1])
new_df_high_risk_gender.rename(columns = {'Subtopic_x':'Total Students Obs.','Subtopic_y': 'Student Obs. with High Risk' }, inplace = True)
new_df_high_risk_total.rename(columns = {'Subtopic_x':'Total Students Obs.','Subtopic_y': 'Student Obs. with High Risk' }, inplace = True)
We now have two data frames:
We keep both data frames so that we can create a visualization showing the trend of risk behavior for all students, as well as separate trends for male and female high school students.
# Displaying the dataframe showing the counts for observations for high risk and percentage high risk for a particular grade for male and female students
new_df_high_risk_gender.head(20)
| YEAR | Sex | Grade | Total Students Obs. | Student Obs. with High Risk | High Risk Percentage |
|---|---|---|---|---|---|
| 1991 | Female | Eleventh Grade | 251 | 73 | 29.08 |
| 1991 | Female | Ninth Grade | 281 | 53 | 18.86 |
| 1991 | Female | Tenth Grade | 261 | 65 | 24.90 |
| 1991 | Female | Twelfth Grade | 222 | 65 | 29.28 |
| 1991 | Male | Eleventh Grade | 228 | 72 | 31.58 |
| 1991 | Male | Ninth Grade | 277 | 84 | 30.32 |
| 1991 | Male | Tenth Grade | 275 | 93 | 33.82 |
| 1991 | Male | Twelfth Grade | 212 | 72 | 33.96 |
| 1993 | Female | Eleventh Grade | 452 | 121 | 26.77 |
| 1993 | Female | Ninth Grade | 497 | 98 | 19.72 |
| 1993 | Female | Tenth Grade | 488 | 110 | 22.54 |
| 1993 | Female | Twelfth Grade | 434 | 130 | 29.95 |
| 1993 | Male | Eleventh Grade | 402 | 123 | 30.60 |
| 1993 | Male | Ninth Grade | 450 | 117 | 26.00 |
| 1993 | Male | Tenth Grade | 481 | 127 | 26.40 |
| 1993 | Male | Twelfth Grade | 408 | 136 | 33.33 |
| 1995 | Female | Eleventh Grade | 518 | 134 | 25.87 |
| 1995 | Female | Ninth Grade | 653 | 127 | 19.45 |
| 1995 | Female | Tenth Grade | 596 | 129 | 21.64 |
| 1995 | Female | Twelfth Grade | 470 | 138 | 29.36 |
# Displaying the dataframe showing the counts for observations for high risk and percentage high risk for a particular grade
new_df_high_risk_total.head(20)
| YEAR | Grade | Total Students Obs. | Student Obs. with High Risk | High Risk Percentage |
|---|---|---|---|---|
| 1991 | Eleventh Grade | 479 | 145 | 30.27 |
| 1991 | Ninth Grade | 558 | 137 | 24.55 |
| 1991 | Tenth Grade | 536 | 158 | 29.48 |
| 1991 | Twelfth Grade | 434 | 137 | 31.57 |
| 1993 | Eleventh Grade | 854 | 244 | 28.57 |
| 1993 | Ninth Grade | 947 | 215 | 22.70 |
| 1993 | Tenth Grade | 969 | 237 | 24.46 |
| 1993 | Twelfth Grade | 842 | 266 | 31.59 |
| 1995 | Eleventh Grade | 973 | 280 | 28.78 |
| 1995 | Ninth Grade | 1164 | 249 | 21.39 |
| 1995 | Tenth Grade | 1113 | 284 | 25.52 |
| 1995 | Twelfth Grade | 875 | 271 | 30.97 |
| 1997 | Eleventh Grade | 1314 | 383 | 29.15 |
| 1997 | Ninth Grade | 1433 | 320 | 22.33 |
| 1997 | Tenth Grade | 1396 | 347 | 24.86 |
| 1997 | Twelfth Grade | 1150 | 372 | 32.35 |
| 1999 | Eleventh Grade | 1282 | 298 | 23.24 |
| 1999 | Ninth Grade | 1638 | 254 | 15.51 |
| 1999 | Tenth Grade | 1394 | 276 | 19.80 |
| 1999 | Twelfth Grade | 1157 | 321 | 27.74 |
Next, we create pivot tables of the percentage of high-risk observations for all high school students, and then split the gender pivot into separate tables for male and female students, giving three pivot tables in total.
# Creating a pivot table with year and gender as rows, showing the high risk percentage for each grade
pivot_high_risk_gender = new_df_high_risk_gender.pivot_table(index = ['YEAR','Sex'], columns = ['Grade'], values = 'High Risk Percentage')
# Creating a pivot table with year as rows, showing the high risk percentage for each grade
pivot_high_risk_total = new_df_high_risk_total.pivot_table(index = ['YEAR'], columns = ['Grade'], values = 'High Risk Percentage')
# Rearranging the grades in order from ninth to twelfth grade
pivot_high_risk_gender = pivot_high_risk_gender[['Ninth Grade','Tenth Grade','Eleventh Grade','Twelfth Grade']]
pivot_high_risk_total = pivot_high_risk_total[['Ninth Grade','Tenth Grade','Eleventh Grade','Twelfth Grade']]
# Creating separate pivot tables for male and female students.
# xs() selects one value of the 'Sex' index level and drops that level,
# leaving YEAR as the index, so no chained assignment is needed.
pivot_high_risk_male = pivot_high_risk_gender.xs('Male', level='Sex')
pivot_high_risk_female = pivot_high_risk_gender.xs('Female', level='Sex')
# Displaying pivot for high risk percentage for all high school students in all grades
pivot_high_risk_total
| YEAR | Ninth Grade | Tenth Grade | Eleventh Grade | Twelfth Grade |
|---|---|---|---|---|
| 1991 | 24.55 | 29.48 | 30.27 | 31.57 |
| 1993 | 22.70 | 24.46 | 28.57 | 31.59 |
| 1995 | 21.39 | 25.52 | 28.78 | 30.97 |
| 1997 | 22.33 | 24.86 | 29.15 | 32.35 |
| 1999 | 15.51 | 19.80 | 23.24 | 27.74 |
| 2001 | 13.69 | 16.10 | 21.32 | 25.76 |
| 2003 | 10.42 | 13.44 | 18.27 | 21.66 |
| 2005 | 9.07 | 13.25 | 17.62 | 22.20 |
| 2007 | 8.29 | 12.46 | 16.96 | 20.81 |
| 2009 | 7.11 | 10.36 | 16.43 | 18.85 |
| 2011 | 5.92 | 9.51 | 15.68 | 19.38 |
| 2013 | 6.08 | 9.73 | 15.31 | 20.82 |
| 2015 | 6.99 | 9.92 | 14.85 | 19.53 |
| 2017 | 4.84 | 7.14 | 10.96 | 15.24 |
# Displaying pivot showing high risk percentage for male high school students for all grades
pivot_high_risk_male
| YEAR | Ninth Grade | Tenth Grade | Eleventh Grade | Twelfth Grade |
|---|---|---|---|---|
| 1991 | 30.32 | 33.82 | 31.58 | 33.96 |
| 1993 | 26.00 | 26.40 | 30.60 | 33.33 |
| 1995 | 23.87 | 29.98 | 32.09 | 32.84 |
| 1997 | 25.30 | 27.39 | 31.16 | 34.46 |
| 1999 | 17.10 | 22.71 | 23.93 | 30.74 |
| 2001 | 16.63 | 17.58 | 21.42 | 26.81 |
| 2003 | 11.42 | 14.01 | 18.60 | 23.38 |
| 2005 | 9.41 | 13.52 | 17.30 | 23.88 |
| 2007 | 8.32 | 12.96 | 16.78 | 22.67 |
| 2009 | 7.40 | 10.69 | 16.72 | 19.88 |
| 2011 | 6.33 | 10.21 | 15.88 | 19.98 |
| 2013 | 6.21 | 9.99 | 14.98 | 21.03 |
| 2015 | 6.87 | 10.12 | 14.64 | 19.30 |
| 2017 | 4.74 | 7.13 | 10.04 | 15.70 |
# Displaying pivot showing high risk percentage for female high school students for all grades
pivot_high_risk_female
| YEAR | Ninth Grade | Tenth Grade | Eleventh Grade | Twelfth Grade |
|---|---|---|---|---|
| 1991 | 18.86 | 24.90 | 29.08 | 29.28 |
| 1993 | 19.72 | 22.54 | 26.77 | 29.95 |
| 1995 | 19.45 | 21.64 | 25.87 | 29.36 |
| 1997 | 19.68 | 22.45 | 27.39 | 30.53 |
| 1999 | 14.09 | 17.04 | 22.62 | 25.12 |
| 2001 | 10.99 | 14.79 | 21.24 | 24.83 |
| 2003 | 9.52 | 12.97 | 18.00 | 20.41 |
| 2005 | 8.74 | 13.00 | 17.91 | 20.72 |
| 2007 | 8.26 | 12.00 | 17.14 | 19.18 |
| 2009 | 6.85 | 10.06 | 16.19 | 18.00 |
| 2011 | 5.56 | 8.88 | 15.50 | 18.87 |
| 2013 | 5.97 | 9.50 | 15.61 | 20.63 |
| 2015 | 7.10 | 9.73 | 15.03 | 19.76 |
| 2017 | 4.94 | 7.14 | 11.84 | 14.78 |
Using the pivot tables created above, we can build line charts showing the trend of high-risk behavior across the years for all grades. Since we have separate pivots for male and female students, we create three line charts in total: one each for male, female, and all high school students.
import plotly.express as px

# Creating a line graph to see the trend for male students
line_graph_male = px.line(pivot_high_risk_male, x=pivot_high_risk_male.index, y=['Ninth Grade', 'Tenth Grade', 'Eleventh Grade', 'Twelfth Grade'], title="Youth Risk Behaviour")
# Creating a line graph to see trend for female students
line_graph_female = px.line(pivot_high_risk_female, x=pivot_high_risk_female.index, y=['Ninth Grade', 'Tenth Grade', 'Eleventh Grade', 'Twelfth Grade'], title="Youth Risk Behaviour")
# Creating a line graph to see trend for all students
line_graph_all_students = px.line(pivot_high_risk_total, x=pivot_high_risk_total.index, y=['Ninth Grade', 'Tenth Grade', 'Eleventh Grade', 'Twelfth Grade'], title="Youth Risk Behaviour")
# Creating the line graph for all students
main_line_graph = px.line(pivot_high_risk_total, x=pivot_high_risk_total.index, y=['Ninth Grade', 'Tenth Grade', 'Eleventh Grade', 'Twelfth Grade'], title="Youth Risk Behaviour")
# Creating dropdownn buttons to see individual line graphs for students (all students, male students, female students)
updatemenus = [
{'buttons': [
{
'method': 'restyle',
'label': 'All High School Students',
'args': [{'y': [data.y for data in line_graph_all_students.data]}]
},
{
'method': 'restyle',
'label': 'Male High School Students',
'args': [{'y': [data.y for data in line_graph_male.data]}]
},
{
'method': 'restyle',
'label': 'Female High School Students',
'args': [{'y': [data.y for data in line_graph_female.data]}]
}
],
'direction': 'down',
'showactive': True,
}
]
# Updating the layout for the line graph
main_line_graph = main_line_graph.update_layout(
title_text='Change in high risk behavior among youth over the last two decades',
title_x=0.5,
xaxis_showgrid=True,
yaxis_showgrid=True,
hoverlabel=dict(font_size=15, bgcolor='rgb(0,0,139)',
bordercolor= 'Beige'),
yaxis_title = 'High Risk Percentage',
xaxis_title = 'Year',
legend=dict(title='Grade',
x=1,
y=1,
traceorder='normal',
bgcolor='lightblue',
xanchor = 'auto'),
updatemenus=updatemenus
)
# Displaying the created line graph
main_line_graph.show()
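Overall, the data show that risky behavior among youth has decreased over the years, and the pattern is similar for boys and girls. Boys' high-risk percentages are higher than girls' in most years and grades, although the gap narrows over time and in several later cells girls' values are slightly higher (for example, ninth grade in 2015 and 2017, and eleventh grade in 2005, 2007, 2013, 2015, and 2017). Twelfth graders are the most likely to engage in risky behavior, followed by eleventh, tenth, and ninth graders.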
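The grade ordering and the comparison between boys and girls described above can be checked directly against the pivot tables. The sketch below is a minimal, standard-library version that hard-codes the 1997 to 2017 rows of the male and female tables shown above (the earlier male rows fall outside this excerpt, so they are left out) and tests two things: whether the high-risk percentage rises with grade in every year, and in which year and grade cells girls' values exceed boys'. The dictionary layout and the helper names are illustrative assumptions, not the notebook's own code.

```python
# Minimal sketch: high-risk percentages (1997 to 2017) copied from the male and
# female pivot tables above. Column order: 9th, 10th, 11th, 12th grade.
GRADES = ["Ninth Grade", "Tenth Grade", "Eleventh Grade", "Twelfth Grade"]

MALE = {
    1997: [25.30, 27.39, 31.16, 34.46],
    1999: [17.10, 22.71, 23.93, 30.74],
    2001: [16.63, 17.58, 21.42, 26.81],
    2003: [11.42, 14.01, 18.60, 23.38],
    2005: [9.41, 13.52, 17.30, 23.88],
    2007: [8.32, 12.96, 16.78, 22.67],
    2009: [7.40, 10.69, 16.72, 19.88],
    2011: [6.33, 10.21, 15.88, 19.98],
    2013: [6.21, 9.99, 14.98, 21.03],
    2015: [6.87, 10.12, 14.64, 19.30],
    2017: [4.74, 7.13, 10.04, 15.70],
}

FEMALE = {
    1997: [19.68, 22.45, 27.39, 30.53],
    1999: [14.09, 17.04, 22.62, 25.12],
    2001: [10.99, 14.79, 21.24, 24.83],
    2003: [9.52, 12.97, 18.00, 20.41],
    2005: [8.74, 13.00, 17.91, 20.72],
    2007: [8.26, 12.00, 17.14, 19.18],
    2009: [6.85, 10.06, 16.19, 18.00],
    2011: [5.56, 8.88, 15.50, 18.87],
    2013: [5.97, 9.50, 15.61, 20.63],
    2015: [7.10, 9.73, 15.03, 19.76],
    2017: [4.94, 7.14, 11.84, 14.78],
}


def increases_with_grade(table):
    """True if, in every year, each grade's value exceeds the previous grade's."""
    return all(
        all(lower < higher for lower, higher in zip(row, row[1:]))
        for row in table.values()
    )


def cells_where_female_higher(male, female):
    """List (year, grade) pairs where the female percentage exceeds the male one."""
    return [
        (year, GRADES[i])
        for year in sorted(male)
        for i, (m, f) in enumerate(zip(male[year], female[year]))
        if f > m
    ]


print(increases_with_grade(MALE), increases_with_grade(FEMALE))
for year, grade in cells_where_female_higher(MALE, FEMALE):
    print(year, grade)
```

On these rows the grade ordering holds in every year for both sexes, while girls' values exceed boys' in nine year and grade cells, all from 2005 onward.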
As the line plot shows, risky behavior dropped substantially from 1997 to 2017. One possible explanation is growing awareness among youth of different risky behaviors, together with changes in parent-child relationships: parents may now monitor their children's activities more closely. Youth since the early 2000s may also have taken part more often in extracurricular activities, such as trying new sports, learning new skills, and engaging in confidence-building activities. Researchers studying this trend (Borodovsky et al., 2019) have attributed the decline in risky behavior to effective public policies (such as anti-smoking programs), closer parent-child relationships, and the social consequences of electronic media use. One possible mechanism is that a rise in electronic media use reduced unstructured time with friends, which in turn lowered risk behavior.
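To put a number on the drop from 1997 to 2017, the sketch below computes the relative decline, (value in 1997 minus value in 2017) divided by the value in 1997, for each grade and sex. It uses only the 1997 and 2017 rows copied from the pivot tables above; the names are illustrative, and this is a descriptive calculation, not a test of statistical significance.

```python
# Minimal sketch: relative decline in the high-risk percentage from 1997 to 2017,
# using the 1997 and 2017 rows of the male and female pivot tables above.
GRADES = ["Ninth Grade", "Tenth Grade", "Eleventh Grade", "Twelfth Grade"]

ROWS = {
    "Male": {1997: [25.30, 27.39, 31.16, 34.46], 2017: [4.74, 7.13, 10.04, 15.70]},
    "Female": {1997: [19.68, 22.45, 27.39, 30.53], 2017: [4.94, 7.14, 11.84, 14.78]},
}


def percent_decline(start, end):
    """Relative decline from start to end, as a percentage of the start value."""
    return (start - end) / start * 100


# One list of declines per sex, in grade order, rounded to one decimal place
declines = {
    sex: [round(percent_decline(s, e), 1) for s, e in zip(rows[1997], rows[2017])]
    for sex, rows in ROWS.items()
}

for sex, values in declines.items():
    for grade, value in zip(GRADES, values):
        print(f"{sex:6} {grade:15} {value:5.1f}% lower in 2017 than in 1997")
```

On these rows the relative decline is largest for ninth graders and smallest for twelfth graders, for both boys and girls.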
How have governments and other agencies already used such data to reduce youth risk behavior?
State and local agencies and nongovernmental organizations use YRBS data to set school health and health promotion program goals, support modification of school health curricula or other programs, support new legislation and policies that promote health, and seek funding for new initiatives. For example, Hillsborough County, Florida, used YRBS data to enhance its health education, physical education, and health science education programs and to create a guide for high school science teachers to use when discussing specific topics related to HIV, STDs, and unintended pregnancies. In Michigan, YRBS data are used to plan and advocate for coordinated school health programs and other health-related initiatives in the state. The San Francisco Unified School District (SFUSD) developed the SFUSD Family Guide, which presents its YRBS data in an easy-to-read form alongside information on related school health programs, national research, and strategies for promoting health at home.
Conclusions
Overall, our analysis suggests that geography affects the percentage of risky behavior among youth. States near the coastline show a greater risk of youth engaging in the activities above; one possible explanation is that much drug smuggling takes place through ports and harbors. We also observe substantial substance use among youth in every state. Specifically, the risk associated with unhealthy sexual behavior is the greatest, followed by alcohol/drug use and then tobacco use. Because the risks did not show any particular cascading effect in our analysis, addressing only one health risk is unlikely to reduce the others. Last but not least, the data show that risky behavior among youth has decreased over the years regardless of gender.
Based on these results, we offer some recommendations for helping teens develop healthy behaviors. Since the percentage engaging in unhealthy sexual behavior is the highest compared with alcohol/drug use and tobacco use, we suggest making health care more accessible to youth. For example, the government could set up more community health centers where young people can get free HIV/AIDS testing. Teenagers who get tested would then be more aware of their health condition and better able to make informed decisions about sex. Through these health centers, they could also learn how to prevent HIV and avoid unhealthy sexual behaviors. In addition, given the prevalence of risky behaviors among youth, organizations and schools could consider offering more training and prevention programs that support youth development.
Such programs should aim to educate teens about the negative effects of the risky behaviors they frequently engage in and raise awareness of the potential damage to their physical and mental health. With the information we provide, parents can gain a better understanding of the risky behaviors common among youth today. They could then better monitor their children's behavior and communicate with love and respect to keep them from engaging in risky activities. Furthermore, collaborating with content creators such as popular YouTubers or TikTokers to create educational videos may also help raise awareness of youth risk behaviors. Because teenagers spend a lot of time online on computers and smartphones, providing digital educational content they can study on their own may further help them maintain healthy behaviors.
References
Borodovsky, J. T., Krueger, R. F., Agrawal, A., & Grucza, R. A. (2019). A decline in propensity toward risk behaviors among US adolescents. Journal of Adolescent Health, 65(6), 745-751.
Eaton, D. K., Kann, L., Kinchen, S., Shanklin, S., Ross, J., Hawkins, J., ... & Centers for Disease Control and Prevention (CDC). (2008). Youth risk behavior surveillance—United States, 2007. MMWR Surveill Summ, 57(4), 1-131.
Gruber, J. (2001). Introduction to "Risky Behavior among Youths: An Economic Analysis." In Risky Behavior Among Youths: An Economic Analysis (pp. 1-28). University of Chicago Press.
Healthy for whom? Teen STD rates soar on Salt Lake City's west side. (2011, January 16). Center for Health Journalism. Retrieved December 4 from https://centerforhealthjournalism.org/fellowships/projects/healthy-whom-teen-std-rates-soar-salt-lake-citys-west-side
Louisiana spends less than 3% of tobacco revenue on anti-smoking programs, earning failing grades. (2020, February 9). NOLA.com. Retrieved December 4, 2020, from https://www.nola.com/news/healthcare_hospitals/louisiana-spends-less-than-3-of-tobacco-revenue-on-anti-smoking-programs-earning-failing-grades/article_b2109850-49f6-11ea-b8c2-0f9482e1ea21.html
National Drug Intelligence Center. (2003, July). Florida drug threat assessment. Justice.gov. Retrieved December 5, 2022, from https://www.justice.gov/archive/ndic/pubs5/5169/overview.htm
Scientists find a connection between 9/11 and substance abuse. (2007). Heads Up (Scholastic). Retrieved December 5, 2022, from https://headsup.scholastic.com/students/scientists-find-a-connection-between-911-and-substance-abuse/#:~:text=There%20was%20also%20indication%20of,these%20substances%20before%209%2F11.
United States. Public Health Service. Office of the Surgeon General, National Center for Chronic Disease Prevention, & Health Promotion (US). Office on Smoking. (2012). Preventing tobacco use among youth and young adults: A report of the surgeon general. US Government Printing Office.